setdiff - Difference of Two Sets of Filenames

Mark Leighton Fisher on 2006-02-17T17:19:09

Using "some filenames except those that match this pattern" is a multi-step process in common shells. You have to generate the two lists of filenames, put those lists into separate files, then run "fgrep -v -f set2 set1" to get the list of files in the first set that are not in the second set.
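As a rough sketch of those manual steps (the scratch directory, the sample filenames, and the temporary file names are made up for illustration, and `grep -F` stands in for fgrep):

```shell
# Scratch directory with a few sample files (hypothetical names).
demo=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX") || exit 1
touch "$demo/main.c" "$demo/util.h" "$demo/README" "$demo/notes.txt"

# Steps 1 and 2: generate the two lists and put them into separate files.
set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX") || exit 1
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX") || exit 1
( cd "$demo" && ls -1 -d * ) > "$set1"
( cd "$demo" && ls -1 -d *.h *.c ) > "$set2"

# Step 3: keep the lines of the first list that match nothing in the second.
result=$(grep -F -v -f "$set2" "$set1")
printf '%s\n' "$result"

# Remove the scratch directory and both list files.
rm -rf "$demo" "$set1" "$set2"
```

With these sample files, only README and notes.txt should be left in the result.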

setdiff is a small shell program for obtaining the difference of two sets of filenames. You use it by:

    setdiff FIRSTSET SECONDSET

where FIRSTSET is a quoted glob for the first set of filenames, and SECONDSET is a quoted glob for the second set of filenames. (You can also use actual filename lists in place of the quoted globs.) For example, to get all files that are not C source or header files in your current directory, you would use:

    setdiff '*' '*.h *.c'

which would print all filenames not ending in .h or .c to standard output.
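The quotes matter because an unquoted glob is expanded by the calling shell before setdiff ever sees it. A small sketch of the difference, using a hypothetical helper `count_args` and invented filenames:

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Scratch directory with sample files (names are assumptions).
demo=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX") || exit 1
touch "$demo/a.c" "$demo/b.c" "$demo/c.h"

# Quoted: the pattern reaches the command as one literal argument.
quoted=$(cd "$demo" && count_args '*.c')
# Unquoted: the calling shell expands the glob first, one argument per match.
unquoted=$(cd "$demo" && count_args *.c)

echo "quoted: $quoted, unquoted: $unquoted"
rm -rf "$demo"
```

With two .c files in the directory, the quoted call receives one argument and the unquoted call receives two, which would leave setdiff with the wrong idea of where the first set ends.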

A slightly more complicated example is finding out what files are not source code files in a project that is a mixture of Perl, C, and Java:

    setdiff '*' '*.h *.c *.pl *.pm *.java'

A final example – find the XML files that are not XSL files (*.xml, *.xsd, etc.) mixed in with a bunch of source code files:

    setdiff '/home/mycyc-0.22/*.x*' '/home/mycyc-0.22/*.xsl'

Here is the code:

#!/usr/bin/sh
# Output difference between two sets of filenames,
# i.e. the set difference of the two filename sets.
# Names are assumed to be canonicalized already.
#
# This is the relative complement of B relative to A,
# also known as the set theoretic difference.
# Examples:
#   { 1, 2, 4 } - { 1, 2, 5 } = { 4 }
#   { 1, 2, 5 } - { 1, 2, 4 } = { 5 }


# check arguments: both set expressions must be non-empty
if [ -z "$1" ] || [ -z "$2" ]; then
  echo "usage: setdiff FILESETEXPR1 FILESETEXPR2" >&2
  exit 1
fi

# get a temporary filename for set #1
set1=`mktemp -t`
if [ -z "$set1" ]; then
  echo "can't get temporary filename for set1" >&2
  exit 1
fi

# get a temporary filename for set #2
set2=`mktemp -t`
if [ -z "$set2" ]; then
  echo "can't get temporary filename for set2" >&2
  exit 1
fi

# get the sets into temporary files
# ($1 and $2 are left unquoted on purpose so the shell expands the globs)
ls -1 $1 > "$set1"
ls -1 $2 > "$set2"

# compute all elements of set #1 not in set #2
# (-x matches whole lines, so "a.c" does not also exclude "data.c")
fgrep -x -v -f "$set2" "$set1"

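By the way, setunion is setdiff, only with "fgrep -f $set1 $set2" at the end. (Note that this prints the names that appear in both sets, which is the set intersection rather than the union.)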
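A quick way to see what that final line selects is to run it on two tiny lists. In this sketch the list contents are invented, `grep -F` is used as the equivalent of fgrep, `-x` is added so that only whole names match, and `sort -u` is shown as one possible way to merge the two lists:

```shell
set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX") || exit 1
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX") || exit 1
printf '%s\n' a.c b.c notes.txt > "$set1"
printf '%s\n' b.c notes.txt z.h > "$set2"

# Lines of set2 that also appear in set1 (names present in both lists).
in_both=$(grep -F -x -f "$set1" "$set2")
# Every name from either list, with duplicates removed.
in_either=$(LC_ALL=C sort -u "$set1" "$set2")

printf 'both:\n%s\n' "$in_both"
printf 'either:\n%s\n' "$in_either"
rm -f "$set1" "$set2"
```

Here the grep line yields b.c and notes.txt, while the sort line yields all four names.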


Korn shell

runrig on 2006-02-17T18:19:15

setdiff '*' '*.h *.c'


ls !(*.[ch])
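
For a shell that lacks the !(pattern) operator, a plain POSIX loop with case can sketch the same filter. The function name `list_non_c` and the sample files are assumptions for illustration:

```shell
# Hypothetical helper: print names in directory $1 that do not end in .c or .h.
# The body runs in a subshell, so the cd does not affect the caller.
list_non_c() (
  cd "$1" || exit 1
  for f in *; do
    case $f in
      *.[ch]) ;;                  # skip C sources and headers
      *) printf '%s\n' "$f" ;;    # keep everything else
    esac
  done
)

demo=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX") || exit 1
touch "$demo/main.c" "$demo/util.h" "$demo/README" "$demo/notes.txt"

not_c=$(list_non_c "$demo")
printf '%s\n' "$not_c"
rm -rf "$demo"
```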

zsh!

Dom2 on 2006-02-17T22:31:13

I love zsh. It supports the Korn shell globbing above as well as:
% ls ^*.[ch]

zsh globbing manual

-Dom

comm

Dom2 on 2006-02-17T22:37:38

BTW, you probably also want to be using comm(1) instead of fgrep.
comm -23 $set1 $set2
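
One reason to prefer comm: plain "fgrep -v -f" treats each name as a substring pattern, so a short name can knock out longer names that contain it. A small sketch with invented list contents (`grep -F` is used as the equivalent of fgrep, and both lists are sorted in the C locale because comm expects sorted input):

```shell
set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX") || exit 1
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX") || exit 1
printf '%s\n' README a.c data.c | LC_ALL=C sort > "$set1"
printf '%s\n' a.c | LC_ALL=C sort > "$set2"

# Substring matching: "a.c" also matches inside "data.c", so data.c is lost.
with_grep=$(grep -F -v -f "$set2" "$set1")
# Whole-line comparison: only the exact name a.c is removed.
with_comm=$(LC_ALL=C comm -23 "$set1" "$set2")

printf 'grep -F -v -f:\n%s\n' "$with_grep"
printf 'comm -23:\n%s\n' "$with_comm"
rm -f "$set1" "$set2"
```

With these lists, the grep version keeps only README, while comm keeps README and data.c.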

Also, you should probably add a line to delete those temp files:

trap "rm -f $set1 $set2" EXIT HUP INT QUIT TERM
That way, they get deleted on exit, or if some common signal gets delivered.
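
A minimal sketch of that cleanup pattern in isolation, assuming a hypothetical function `with_cleanup` and a single scratch file. It uses single quotes so that $scratch is expanded when the trap runs, and routes the signals through exit so the EXIT trap does the actual removal:

```shell
# Hypothetical helper: create a scratch file, use it, and print its path.
# The body runs in a subshell, so the traps fire when that subshell ends.
with_cleanup() (
  scratch=$(mktemp "${TMPDIR:-/tmp}/setdiff.XXXXXX") || exit 1
  trap 'rm -f "$scratch"' EXIT        # remove the file on any exit
  trap 'exit 1' HUP INT QUIT TERM     # turn common signals into an exit
  printf 'a.c\n' > "$scratch"         # stand-in for the real work
  printf '%s\n' "$scratch"
)

leftover=$(with_cleanup)
if [ -e "$leftover" ]; then
  echo "scratch file still exists"
else
  echo "scratch file was removed"
fi
```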

-Dom